An Adaptive Index Structure for High-Dimensional Similarity Search

Authors

Peng Wu

B. S. Manjunath

Shivkumar Chandrasekaran

Abstract

A practical method is presented for creating a high-dimensional index structure that adapts to the data distribution and scales well with database size. Typical media descriptors, such as texture features, are high dimensional and are not uniformly distributed in the feature space. The performance of many existing methods degrades when the data is not uniformly distributed. The proposed method offers an efficient solution to this problem. First, the data’s marginal distribution along each dimension is characterized using a Gaussian mixture model. The parameters of this model are estimated using the well-known Expectation-Maximization (EM) method. These model parameters can also be estimated sequentially for on-line updating. Using the marginal distribution information, each data dimension can be partitioned so that each bin contains approximately the same number of objects. Experimental results on a real image texture data set are presented. Comparisons with existing techniques, such as the well-known VA-File, demonstrate a significant overall improvement.
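
The two steps named in the abstract (fit a Gaussian mixture to one dimension with EM, then cut that dimension into bins of roughly equal population) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`fit_gmm_1d`, `mixture_cdf`, `equal_mass_boundaries`, `bin_counts`), the choice of two mixture components, eight bins, the quantile-based initialisation, and the bisection search on the mixture CDF are all assumptions made for illustration, and the data is synthetic.

```python
import bisect
import math
import random


def fit_gmm_1d(xs, k=2, iters=50):
    """Batch EM for a 1-D Gaussian mixture. Returns (weights, means, variances)."""
    n = len(xs)
    ordered = sorted(xs)
    # Assumed initialisation: means at evenly spaced quantiles, shared variance.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(xs) / n
    variances = [sum((x - centre) ** 2 for x in xs) / n] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # floor avoids a collapsed component
    return weights, means, variances


def mixture_cdf(x, weights, means, variances):
    """Cumulative distribution function of the fitted mixture at x."""
    return sum(w * 0.5 * (1.0 + math.erf((x - m) / math.sqrt(2.0 * v)))
               for w, m, v in zip(weights, means, variances))


def equal_mass_boundaries(weights, means, variances, bins):
    """Boundaries where the mixture CDF equals 1/bins, 2/bins, ... (by bisection)."""
    spread = 10.0 * math.sqrt(max(variances))
    lo, hi = min(means) - spread, max(means) + spread
    bounds = []
    for i in range(1, bins):
        target = i / bins
        a, b = lo, hi
        for _ in range(60):
            mid = (a + b) / 2.0
            if mixture_cdf(mid, weights, means, variances) < target:
                a = mid
            else:
                b = mid
        bounds.append((a + b) / 2.0)
    return bounds


def bin_counts(xs, bounds):
    """Number of points falling into each bin defined by the boundaries."""
    counts = [0] * (len(bounds) + 1)
    for x in xs:
        counts[bisect.bisect_right(bounds, x)] += 1
    return counts


# Synthetic, non-uniform one-dimensional data: two clusters of unequal size.
rng = random.Random(0)
data = [rng.gauss(0, 1) if rng.random() < 0.7 else rng.gauss(6, 0.5)
        for _ in range(2000)]
params = fit_gmm_1d(data)
bounds = equal_mass_boundaries(*params, bins=8)
print(bin_counts(data, bounds))
```

Placing boundaries at equal steps of the fitted CDF, rather than at equal widths, is what keeps the bins similarly populated when the data is clustered. A full index would repeat this for every dimension; the sequential (on-line) parameter update mentioned in the abstract is not covered by this sketch.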

Similar resources

The Classification of High Dimensional Indices for Spatial Data Similarity Search

The applications of spatial data similarity search are increasingly needed nowadays, and accordingly high dimensional index becomes one key technology to solve the problem of spatial data similarity search. Firstly, the distribution of high dimensional data is in-depth analyzed, and then high dimensional data retrieval for spatial data similarity search is also discussed. Secondly, based on the...

An Adaptive Multi-level Hashing Structure for Fast Approximate Similarity Search

Fast information retrieval is an essential task in data management, mainly due to the increasing availability of data. To address this problem, database researchers have developed indexing techniques to logically organize elements from large datasets in order to answer queries efficiently. In this context, an approximate similarity search algorithm known as Locality Sensitive Hashing (LSH) was ...

MLR-Index: An Index Structure for Fast and Scalable Similarity Search in High Dimensions

High-dimensional indexing has been very popularly used for performing similarity search over various data types such as multimedia (audio/image/video) databases, document collections, time-series data, sensor data and scientific databases. Because of the curse of dimensionality, it is already known that well-known data structures like kd-tree, R-tree, and M-tree suffer in their performance over...

The Hybrid Tree: An Index Structure for High Dimensional Feature Spaces

Feature based similarity search is emerging as an important search paradigm in database systems. The technique used is to map the data items as points into a high dimensional feature space which is indexed using a multidimensional data structure. Similarity search then corresponds to a range search over the data structure. Although several data structures have been proposed for feature indexing...

The GC-tree: a high-dimensional index structure for similarity search in image databases

With the proliferation of multimedia data, there is an increasing need to support the indexing and retrieval of high-dimensional image data. In this paper, we propose a new dynamic index structure called the GC-tree (or the grid cell tree) for efficient similarity search in image databases. The GC-tree is based on a special subspace partitioning strategy which is optimized for a clustered high-...

Publication date: 2001
